Abstract

Experimental observations suggest that proteins follow different folding pathways under different environmental conditions. We perform molecular dynamics simulations of a model of the SH3 domain over a broad range of temperatures and identify distinct pathways in the folding transition. We determine the kinetic partition temperature (the temperature at which the SH3 domain undergoes a rapid folding transition with minimal kinetic barriers) and observe that below this temperature the model protein may fold via multiple folding pathways. The folding kinetics is characterized by slow and fast pathways and the presence of only one or two intermediates. Our findings suggest the hypothesis that the SH3 domain, a protein for which only two-state folding kinetics has been observed in previous experiments, may exhibit intermediate states under extreme experimental conditions, such as very low temperatures. A very recent report (Viguera et al., Proc. Natl. Acad. Sci. USA, 100:5730-5735, 2003) of an intermediate in the folding transition of the Bergerac mutant of the α-spectrin SH3 domain protein supports this hypothesis.

INTRODUCTION



Recent experimental studies indicate that several proteins simultaneously exhibit a variety of intermediates and folding pathways. Kiefhaber [1] identified, at low denaturant concentration, a fast pathway (50 ms) in the folding of lysozyme with no intermediates and a slow phase (420 ms) with well-populated intermediates. Choe et al. [2] observed the formation of a kinetic intermediate in the folding of villin 14T upon decreasing the temperature, and Silverman et al. [3] observed the extinction of a slow phase in the folding of the P4-P6 domain upon changes in ion concentration. Kitahara et al. studied a pressure-stabilized intermediate of ubiquitin, identified as an off-pathway intermediate in previous kinetics experiments at basic conditions [5]. All these studies suggest that environmental conditions favor some folding pathways over others.

Major theoretical efforts in the study of protein folding [6-16] have focused on small, single-domain proteins [17]. Experiments [17, 18] show that these proteins undergo a folding transition with no accumulation of kinetic intermediates in the accessible range of experimental conditions. However, other kinetics studies of two-state proteins [19-22] suggest the presence of short-lived intermediates that cannot be directly detected experimentally. Recently, Sanchez et al. [23] explained the curved chevron plots (the nonlinear dependence of folding and unfolding rates on denaturant concentration [24-26]) of 17 selected proteins by assuming the presence of an intermediate state. Led by these studies, we hypothesize that single-domain proteins may exhibit intermediates in the folding transition under suitable environmental conditions.

To test our hypothesis, we perform a molecular dynamics study of the folding pathways of the c-Crk SH3 domain [27-29] (PDB [29] access code 1cka). The SH3 domains are a family of small globular proteins that has been extensively studied in kinetics and thermodynamics experiments [18, 30-37]. We select the c-Crk SH3 domain (57 residues) as the representative of the family and perform molecular dynamics simulations over a broad range of temperatures. We determine the kinetic partition temperature [12, 38], $T_{KP}$, below which the model protein exhibits slow folding pathways and above which the protein undergoes a cooperative folding transition with no accumulation of intermediates. Below $T_{KP}$, we study the presence of intermediates in the slow folding pathways and resolve their structure. We find that one of the intermediates populates the folding transition for temperatures as high as $T_{KP}$. We discuss the relevance of our results in light of recent experimental evidence.



RESULTS 

The SH3 domain is a β-sheet protein (upper triangle of Fig. 1a). Our previous thermodynamic studies [6] of the c-Crk SH3 domain revealed only two stable states at equilibrium conditions: folded and unfolded. Both states coexist with equal probability at the folding transition temperature, $T_F = 0.626$, at which the temperature dependence of the potential energy changes sharply and the specific heat has a maximum (experimentally [18], this temperature corresponds to 67°C). Thus, our model reproduces the experimentally determined thermodynamics of the SH3 domain [18, 30, 31].

Initial Unfolded Ensemble 

Our initial unfolded ensemble consists of 1100 protein conformations that we sample from a long equilibrium simulation at a high temperature, $T = 1.0$, at equal time intervals of $10^4$ time units (t.u.). This time separation is long enough to ensure that the sampled conformations have low structural similarity among themselves. We calculate the frequency map (the plot of the probability of any two amino acids forming a contact) of this unfolded ensemble (lower triangle of Fig. 1a). At $T = 1.0$, only nearest and next-nearest contacts have high frequency, and the frequency decreases rapidly with the sequence separation between the amino acids.
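As an illustration of the frequency map described above, the following minimal sketch computes, for each contact, the fraction of conformations in which it is present. It assumes that every conformation has already been reduced to a set of residue-index pairs in contact; the function name `frequency_map` and the toy ensemble are hypothetical and are not taken from the simulation code used in this study.

```python
from typing import Dict, List, Set, Tuple

Contact = Tuple[int, int]  # pair of residue indices (i < j)


def frequency_map(ensemble: List[Set[Contact]]) -> Dict[Contact, float]:
    """Return the fraction of conformations in which each contact is present."""
    if not ensemble:
        raise ValueError("ensemble must contain at least one conformation")
    counts: Dict[Contact, int] = {}
    for contacts in ensemble:
        for pair in contacts:
            counts[pair] = counts.get(pair, 0) + 1
    n = len(ensemble)
    return {pair: c / n for pair, c in counts.items()}


# Toy ensemble of four conformations (hypothetical data, not simulation output).
toy_ensemble = [
    {(0, 1), (0, 2)},
    {(0, 1)},
    {(0, 1), (0, 2), (0, 10)},
    {(0, 1)},
]
freq = frequency_map(toy_ensemble)
```

In this toy ensemble the nearest-neighbor contact is always present, while the contact with the largest sequence separation appears in only one conformation out of four, mimicking the qualitative trend reported for the unfolded ensemble.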

When we quench the system from $T = 1.0$ to a target temperature, $T_{target}$ (see Methods), the system relaxes in approximately 1500 t.u. Due to the finite size of our heat bath, the heat released by the protein upon folding increases the final temperature of the system by 0.03 units above $T_{target}$. After relaxation, the protein stays for a certain time in the unfolded state and then undergoes a folding transition. During this time interval, the protein explores unfolded conformations, and we calculate the frequency map of the unfolded state for different target temperatures.

At $T_{target} = T_F$, the secondary structure is unstable (Fig. 1b), with average frequency $f = 0.24$ (see Methods). Successful folding requires the cooperative formation of contacts throughout the protein in a nucleation process. At $T_{target} = 0.54$, the secondary structure is more stable (Fig. 1c, $f = 0.50$). Thus, the conformational search for the native state (NS) is optimized by limiting the search to the formation of a sufficient number of long-range contacts. At $T_{target} = 0.33$, the lowest temperature studied, secondary structure elements form during the rapid collapse of the model protein in the first 1500 t.u. (Fig. 1d, $f = 0.73$). During collapse, some tertiary contacts (contacts between secondary structure elements) may also form. The formation of these contacts prior to the proper arrangement of secondary structure elements may lead the model protein to a kinetic trap. Finally, folding proceeds at this temperature through a thermally activated search for the NS.

Kinetic Partition Temperature 

In order to determine the temperature below which we can distinguish fast and slow folding pathways, we compute the distribution of folding times $p(t_F, T)$ (Fig. 2a-e), as well as the average $\langle t_F \rangle$ (Fig. 2f) and the standard deviation $\sigma_F$. The ratio $r(T) = \langle t_F \rangle / \sigma_F$ measures the average folding time in units of the standard deviation $\sigma_F$. This quantity is particularly useful when the value of the standard deviation correlates with the value of the average as we change $T_{target}$. For instance, single-exponential distributions $e^{-t_F/\langle t_F \rangle} / \langle t_F \rangle$ have $r = 1$.
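The statement that a single-exponential distribution of folding times has a ratio of one can be checked directly. Writing $\tau$ for the average folding time and $\sigma_F$ for the standard deviation (symbols chosen here for the derivation), the first two moments of the distribution $e^{-t/\tau}/\tau$ give:

```latex
\begin{align}
\langle t_F \rangle &= \int_0^\infty t\,\frac{e^{-t/\tau}}{\tau}\,dt = \tau, \\
\langle t_F^2 \rangle &= \int_0^\infty t^2\,\frac{e^{-t/\tau}}{\tau}\,dt = 2\tau^2, \\
\sigma_F^2 &= \langle t_F^2 \rangle - \langle t_F \rangle^2 = \tau^2, \\
r &= \frac{\langle t_F \rangle}{\sigma_F} = 1 .
\end{align}
```

A distribution narrower than an exponential therefore has a ratio larger than one, and a distribution spread over several orders of magnitude has a ratio smaller than one.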

We expect $r \to 1$ for $T_{target} > T_F$, because at these high temperatures the folding transitions become rare events and are exponentially distributed. As we decrease $T_{target}$, we expect $r > 1$ just below $T_F$, because the folded state becomes more stable than the unfolded state, and the folding transitions are favored. Distributions with $r > 1$ are narrow and centered at $\langle t_F \rangle$, so that most of the simulations undergo a folding transition in times of the order of the average folding time. However, if we continue decreasing $T_{target}$, we expect some folding transitions to be kinetically trapped, and the folding time distribution will spread over several orders of magnitude. Such distributions have $r < 1$. Thus, there is a temperature below $T_F$ at which the maximum of $r(T)$ occurs and which signals the onset of slow folding pathways. We use the maximum of $r(T)$ to calculate $T_{KP}$.
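The selection of the kinetic partition temperature as the maximum of the ratio can be sketched as follows. This is a minimal illustration under stated assumptions: the folding times below are invented toy values, the helper names `ratio` and `kinetic_partition_temperature` are hypothetical, and the population standard deviation is used because the text does not specify which estimator was applied.

```python
import statistics
from typing import Dict, List


def ratio(folding_times: List[float]) -> float:
    """Average folding time divided by its standard deviation at one temperature."""
    return statistics.mean(folding_times) / statistics.pstdev(folding_times)


def kinetic_partition_temperature(times_by_temperature: Dict[float, List[float]]) -> float:
    """Return the temperature at which the ratio is maximal."""
    return max(times_by_temperature, key=lambda t: ratio(times_by_temperature[t]))


# Hypothetical folding times (in t.u.) at three target temperatures.
toy_times = {
    0.33: [1e3, 2e3, 1e6, 5e5],       # spread over orders of magnitude, ratio < 1
    0.54: [900.0, 1000.0, 1100.0, 1000.0],  # compact distribution, ratio > 1
    0.64: [100.0, 500.0, 1500.0, 4000.0],   # roughly exponential, ratio close to 1
}
t_kp = kinetic_partition_temperature(toy_times)
```

With real data one would evaluate the ratio on a fine grid of target temperatures; the uncertainty in locating the maximum then sets the error of the estimate.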

Fig. 2g suggests that $T_{KP} = 0.54$, which corresponds to a maximally compact distribution of folding times* (Fig. 2d). We find that the ratio approaches one as we increase the temperature above $T_{KP}$, and the distribution of folding times approximates a single-exponential distribution. In particular, the distribution of folding times fits the single-exponential distribution $e^{-t_F/\langle t_F \rangle} / \langle t_F \rangle$ for $T = 0.64$, the closest temperature to $T_F$ that we study. The ratio $r(T)$ decreases monotonically below $T_{KP}$, indicating that the distribution of folding times spreads over several orders of magnitude. This is the consequence of an increasing fraction of folding simulations being kinetically trapped (Fig. 2a-b). The average folding time $\langle t_F \rangle$ is minimal not at $T_{KP}$, but at a lower temperature, $T_{\langle t_F \rangle} = 0.49$ (Fig. 2f). At this temperature, we find that the protein becomes temporarily trapped in approximately 7% of the folding transitions. On the other hand, the remaining simulations undergo a folding transition much faster, thus minimizing $\langle t_F \rangle$. Interestingly, $r(T_{\langle t_F \rangle}) \approx 1.0$, even though the distribution of folding times at this temperature is non-exponential.

* Assuming a linear relation between experimental and simulated temperatures and taking into account [18] that $T_F$ = 67°C, we estimate $T_{KP} \approx 20$°C.
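The footnote's estimate of the kinetic partition temperature in experimental units can be reproduced with a short calculation. The sketch below assumes that the "linear relation" is a proportionality between simulated temperature and absolute temperature, anchored at the folding temperature (0.626 in simulation units, 67°C experimentally); this reading is an assumption, and the function name `sim_to_celsius` is hypothetical.

```python
T_F_SIM = 0.626        # folding transition temperature in simulation units (from the text)
T_F_EXP_C = 67.0       # experimental folding temperature in degrees Celsius (from the text)
KELVIN_OFFSET = 273.15


def sim_to_celsius(t_sim: float) -> float:
    """Map a simulation temperature to Celsius.

    ASSUMPTION: simulation temperature is proportional to absolute temperature,
    with the proportionality fixed at the folding transition temperature.
    """
    t_f_exp_kelvin = T_F_EXP_C + KELVIN_OFFSET
    return t_sim / T_F_SIM * t_f_exp_kelvin - KELVIN_OFFSET


t_kp_celsius = sim_to_celsius(0.54)
```

Under this assumption the simulated value 0.54 maps to roughly 20°C, in line with the footnote's estimate.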

Folding Pathways 

Below $T_{KP}$, an increasing fraction of the simulations undergo folding transitions that take up to three orders of magnitude longer than the minimal $\langle t_F \rangle$. In addition, $\langle t_F \rangle$ increases dramatically (Fig. 2f). At the lowest temperatures studied, we distinguish between the majority of simulations, which undergo a fast folding transition (the fast pathway), and the rest of the simulations, which undergo folding transitions with folding times spanning three orders of magnitude (the slow pathways). At the low temperature $T = 0.33$, the potential energy of the fast pathway has on average the same time evolution as that of all the simulations at $T_{KP} = 0.54$, indicating that there are no kinetic traps in the fast pathway.

For each folding simulation that belongs to the slow pathways, we sample the potential energy at equal time intervals of 100 t.u. until folding is finished (see Methods). Then, we collect all potential energy values and construct a distribution of potential energies. We find that below $T = 0.43$, the distribution is markedly bimodal (Fig. 3a). The positions of the two peaks along the energy coordinate do not correspond to the equilibrium potential energy value of the folded state (Fig. 3b). Therefore, we hypothesize the existence of two intermediates in the slow pathways. We denote the two putative intermediates as $I_1$ and $I_2$ for the high-energy and low-energy peaks, respectively. As temperature decreases, the peaks shift to lower energies, but the energy difference between the two peaks, approximately six energy units, remains constant (Fig. 3b). A constant energy difference suggests that the two putative intermediates differ by a specific set of native contacts. As temperature decreases, other contacts not belonging to this set become more stable and are responsible for the overall energy decrease. At $T = 0.33$, we record the distribution of survival times for both intermediates and find that they fit a single-exponential distribution, supporting the hypothesis that each intermediate is a local free energy minimum with a single major free energy barrier (Fig. 3c).

To further test the single free energy barrier hypothesis, we select a typical conformation representing intermediate $I_2$ and perform 200 folding simulations, each with a different set of initial velocities, for a set of temperatures in the range $0.33 < T < 0.52$. For each simulation, we record the time that the protein stays in the intermediate and find that the average survival time fits the Arrhenius law for temperatures below $T = 0.44$ (Fig. 3d). This upper-bound temperature roughly coincides with the temperature $T = 0.43$ below which $I_2$ becomes noticeable in the histogram of potential energies (Fig. 3a). This result indicates that the free energy barrier to escape intermediate $I_2$ becomes independent of temperature at low temperatures, or analogously, that the same set of native contacts must form (or break) to overcome the intermediate.
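An Arrhenius fit of the kind described above amounts to a straight-line fit of the logarithm of the average survival time against the inverse temperature, with the slope giving the barrier. The following minimal sketch assumes temperatures in simulation units with the Boltzmann constant set to one; the function name `arrhenius_fit`, the prefactor, and the synthetic data are hypothetical, and only the barrier value of 5.85 energy units is taken from the Discussion.

```python
import math
from typing import List, Tuple


def arrhenius_fit(temps: List[float], mean_times: List[float]) -> Tuple[float, float]:
    """Least-squares fit of ln(tau) = ln(A) + barrier / T.

    Returns (barrier, A). ASSUMPTION: temperatures are in energy units (k_B = 1).
    """
    x = [1.0 / t for t in temps]
    y = [math.log(tau) for tau in mean_times]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    barrier = sxy / sxx
    prefactor = math.exp(y_mean - barrier * x_mean)
    return barrier, prefactor


# Synthetic survival times generated from an exact Arrhenius law (not simulation data).
temps = [0.33, 0.36, 0.39, 0.42]
true_barrier, true_prefactor = 5.85, 0.05
times = [true_prefactor * math.exp(true_barrier / t) for t in temps]
barrier, prefactor = arrhenius_fit(temps, times)
```

With measured survival times, deviations from the straight line at the high-temperature end would mark the upper bound of the Arrhenius regime.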

Next, we determine the structure of the two intermediates. For each intermediate, we randomly select three conformations and find that they are structurally similar. Conformations belonging to intermediate $I_1$ have a set of long-range contacts ($C_1$) with a high occupancy and a set of long-range contacts ($C_2$) with no occupancy at all (Fig. 3e). Contacts in $C_1$ represent a β-sheet made by three strands: the two termini and the strand following the RT-loop, which we name strand "A" (see $I_1$ in Fig. 4). Contacts in $C_2$ represent the base of the n-Src loop and the contacts between the RT-loop and the distal hairpin (see $I_2$ in Fig. 4). In addition, $I_1$ has a set of medium-range contacts ($C_3$) with high occupancy (Fig. 3e), representing the distal hairpin and a part of the n-Src loop. In a slow folding transition, the β-sheet ($C_1$ contacts) forms in the early events, and strand "A" can no longer move freely. This constrained motion prevents strand "A" from forming contacts with one of the strands of the distal hairpin, which we name strand "B" (see $I_2$ in Fig. 4). Similarly, strand "B" cannot move freely because it is a part of $C_3$. The missing contacts between strand "A" and strand "B" are the contacts that form the base of the n-Src loop.

Conformational changes leading the protein away from intermediate $I_1$ involve either dissociation of the β-sheet, thus breaking some contacts of $C_1$, or dissociation of the distal hairpin, thus breaking some contacts of $C_3$. We find that the latter dissociation may lead the protein conformation to intermediate $I_2$. Intermediate $I_2$ has contacts of $C_1$, but lacks the set of contacts ($C_4$) that form the base of the distal hairpin (see NS in Fig. 4).

Once we identify the structure of the intermediates, we investigate whether intermediate $I_1$ is present at higher temperatures, where no distinction can be made between fast and slow folding pathways. To test this possibility, we sample the protein conformation during the folding transition at equal time intervals of 60 t.u. for each of the 1100 simulations and compare these conformations to intermediate $I_1$ with a similarity score function (see Methods). For each folding transition, we record only the highest value of the similarity score, thus obtaining 1100 highest-score values. At $T_{KP}$, the histogram of the highest scores is bimodal, with 25% of the folding simulations passing through intermediate $I_1$ (Fig. 3f). We find that at $T_{KP}$, simulations that undergo the folding transition through $I_1$ show folding kinetics no different from those of the rest of the simulations.
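The bookkeeping behind this analysis, keeping only the highest similarity score of each folding transition and counting the transitions whose highest score falls in the upper mode, can be sketched as below. The threshold of 0.8 and the toy score traces are hypothetical; the text identifies the population passing through the intermediate from the bimodal histogram and does not state a numerical cutoff.

```python
from typing import List


def highest_scores(score_traces: List[List[float]]) -> List[float]:
    """Keep only the highest similarity score recorded along each folding transition."""
    return [max(trace) for trace in score_traces]


def fraction_through_intermediate(score_traces: List[List[float]], threshold: float) -> float:
    """Fraction of transitions whose highest score reaches the (assumed) threshold."""
    best = highest_scores(score_traces)
    return sum(1 for s in best if s >= threshold) / len(best)


# Toy similarity-score traces for four folding transitions (hypothetical values).
toy_traces = [
    [0.0, 0.2, 0.1, 0.0],
    [0.1, 0.9, 0.4, 0.0],
    [0.0, 0.3, 0.2, 0.0],
    [0.0, 0.1, 0.3, 0.1],
]
frac = fraction_through_intermediate(toy_traces, threshold=0.8)
```

In practice the cutoff would be placed at the minimum separating the two modes of the histogram of highest scores.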

DISCUSSION 

It was shown [6] that the simplified protein model and interaction potentials that we use here reproduce, in a certain range of temperatures, the experimentally determined two-state thermodynamics of the SH3 domain [18]. The qualitative predictive power of the model encouraged us to study the folding kinetics in a wider range of temperatures. From our relaxation studies of the initial unfolded ensemble, we observe that the structure of the unfolded state is highly sensitive to $T_{target}$. The role of the unfolded state in determining the folding kinetics has already been pointed out in recent experimental and theoretical studies [39-42]. We observe nucleation [6], folding with minimal kinetic barriers, and thermally activated mechanisms for the different observed unfolded states.

In previous studies, various methods have been developed to determine the temperature that signals the onset of slow folding pathways. Socci et al. [43, 44] determined a glass transition temperature, $T_g$, at which the average folding time is halfway between $t_{min}$ and $t_{max}$, where $t_{min}$ is the minimum average folding time and $t_{max}$ is the total simulation time. This method is sensitive to the a priori selected $t_{max}$. The authors varied $t_{max}$ in the range $0.27 \times 10^9 < t_{max} < 0.960 \times 10^9$ and found a 10% error in the calculation of $T_g$. Also, Gutin et al. [45] estimated a critical temperature, $T_c$, at which the temperature dependence of the equilibrium potential energy leveled off. From their results, one can evaluate a 20% error in their calculation of $T_c$. Both $T_g$ and $T_c$ are temperatures that authors use to characterize the onset of multiple folding pathways. In our study we use $T_{KP}$, which signals the breaking of time translational invariance of equilibrium measurements for temperatures below this value [46]. We estimate a 2% error in our calculation of $T_{KP}$ from uncertainties in the location of $T_{KP}$ in Fig. 2g.

At $T_{KP}$, secondary structure elements are partially stable, and the search for the NS reduces to the formation of tertiary contacts. Furthermore, $T_{KP}$ is a relatively high temperature that prevents the stabilization of improper arrangements of the protein conformation, thus minimizing the occurrence of kinetic traps. Below $T_{KP}$, the model protein exhibits two intermediates with well-defined structural characteristics. This modest number of intermediates is a direct consequence of the prevention of non-native contacts, which dramatically reduces the number of protein conformations. Furthermore, since a low energy value implies that most of the native interactions have formed, there are few conformations having both low energy and structural differences with the NS [11].

It is found experimentally [1-3, 47-54] that proteins exhibit only a discrete set of intermediates. Even though in real proteins amino acids that do not form a native contact may still attract each other, experimental and theoretical studies confirm that native contacts have a leading role in the folding transition. Protein engineering experiments [33, 55-58] show that transition states in two-state globular proteins are mostly stabilized by native interactions. To quantitatively determine the importance of native interactions in the folding transition, Paci et al. [59] studied the transition states of three two-state proteins with a full-atom model. They found that, on average, native interactions accounted for approximately 83% of the total energy of the transition states. Of relevance to our studies of the SH3 domain are the full-atom study [60] and the protein engineering experiments [33, 3] showing that the transition state of the src-SH3 domain protein is determined by the NS. On the other hand, evidence exists that in some proteins, non-native contacts are responsible for the presence of intermediates. In their study of the homologous Im7 and Im9 proteins, Capaldi et al. [61] identified a set of non-native interactions responsible for an intermediate state in the folding transition of the Im7 protein. Mirny et al. [62] performed Monte Carlo simulations of two different sequences with the same NS in the 3x3 lattice. One sequence presented a series of pathways with misfolded states due to non-native interactions.

We investigate the kinetics of formation of the two intermediates in a wide range of temperatures. At low temperatures, simulations that undergo folding through intermediate $I_1$ reveal that contacts between the two termini form earlier than the contacts belonging to the folding nucleus [6, 7]. This result coincides with an off-lattice study [63] of a 36-monomer protein by Abkevich et al., in which the authors found an intermediate in the folding transition of their model protein. Inspection of the intermediate revealed no nucleus contacts, but a different set of long-range contacts already formed. In addition, we learned of the work by Viguera et al. [64] after completion of our study. They reported that a mutant of the α-spectrin SH3 domain undergoes a folding transition through one intermediate. The authors observed that the newly introduced long-range contacts had already been formed in the denatured state, preceding the formation of the transition state of the protein. Thus, environmental conditions that favor stabilization of long-range contacts other than the nucleus contacts may induce intermediates in the folding transition.

Alternatively, short-range contacts in key positions of the protein structure may also be responsible for slow folding pathways. After completion of our study, Karanicolas et al. [65, 66] reported their studies of the Go model of the formin binding protein WW domain. The authors found a slow folding pathway in the model protein and a cluster of four short-range native contacts that are responsible for this pathway. However, the authors observed that it was the absence, not the presence, of these native contacts in the unfolded state that generated biphasic folding kinetics. Thus, environmental conditions that favor destabilization of short-range contacts may promote the formation of intermediate states in the folding transition.

We also investigate the survival time of intermediate $I_2$ and find that the free energy barrier separating $I_2$ from the NS is independent of temperature. Thus, the average survival time follows Arrhenius kinetics. The value of the free energy barrier is approximately 5.85 energy units, indicating that about six native contacts break when the protein conformation reaches the transition state that separates $I_2$ from the NS. At the low temperatures where intermediates $I_1$ and $I_2$ are noticeable, thermal fluctuations are still large enough that the observed survival times of $I_2$ would be much smaller if any six native contacts could break. Thus, we hypothesize that it is always the same set of native contacts that must break in the transition $I_2 \to$ NS. Our observations of the transition $I_1 \to I_2$ support this hypothesis. In this transition, we find that the set of contacts $C_4$ always breaks.

At $T_{KP}$, we do not detect the intermediates from kinetics measurements of the average folding time, or analogously, from the folding rate. Thus, we analyze the folding transition with the similarity score function that tests the presence of intermediate $I_1$. We find that this intermediate is populated in 25% of the folding transitions. In a recent study [67], Gorski et al. reported the existence of an intermediate in the folding transition of protein Im9 under acidic conditions (pH = 5.5). This finding led the authors to formulate the hypothesis that Im9 has an intermediate at normal conditions (pH = 7.0), but that it is too unstable to be detected with current kinetic experimental techniques. Interestingly, the homologous protein Im7 (60% sequence identity) undergoes a folding transition through an intermediate in all tested experimental conditions [61, 67, 68], supporting the authors' hypothesis. Thus, changes in both the environmental conditions and the amino acid sequence may uncover hidden intermediates in the folding transition of a two-state protein. In addition, a detailed study at $T_{KP}$ may reveal the intermediates. This is particularly useful for computer simulations, because simulations at low temperatures, where intermediates are easily identifiable, may require several orders of magnitude more time than simulations at $T_{KP}$.

Conclusion 

We perform a molecular dynamics analysis of the folding transition of the Go model of the c-Crk SH3 domain over a broad range of temperatures. At the folding transition temperature, we observe that only the folded and unfolded states are populated. However, as we decrease the temperature, parameters monitoring the folding process, such as the potential energy and the root mean square distance with respect to the native state (rmsd), suggest the presence of intermediates. We determine the kinetic partition temperature $T_{KP}$, below which we observe two folding intermediates, $I_1$ and $I_2$, and above which we do not observe accumulation of intermediates. Below $T_{KP}$, intermediate $I_1$ forms when the two termini and the strand following the RT-loop form a β-sheet prior to the formation of the folding nucleus. This intermediate effectively splits the folding transition into fast and slow folding pathways. Dissociation of part of the β-sheet leads the protein to the native state. We also find that stabilization of this β-sheet and subsequent dissociation of the distal hairpin may lead to intermediate $I_2$.

The key structural characteristics of intermediate $I_1$ allow us to define a similarity score function that probes the presence of the intermediate in a folding transition. We find that $I_1$ is populated even at $T_{KP}$. This result suggests that one can obtain information regarding the existence of putative intermediates by studying the folding trajectories at $T_{KP}$. However, at this temperature, no intermediates are noticeable if one limits the analysis to the distribution of folding times.

We observe that the folding pathways of the model SH3 domain are highly sensitive to temperature, suggesting an important role of the environmental conditions in determining the folding mechanism. Our findings suggest that the SH3 domain, a two-state folder, may exhibit stable intermediates under extreme experimental conditions, such as very low temperatures.

MATERIALS AND METHODS 

Model Protein and Interactions 

We adopt a coarse-grained description of the protein in which each amino acid is reduced to its $C_\beta$ atom ($C_\alpha$ in the case of Gly). The model, the surrounding heat bath, and the selection of structural parameters are discussed in detail in a previous study [6]. The selection of the set of interaction parameters among amino acids is of crucial importance for the resulting folding kinetics of the model protein [11, 12, 14]. Experimental and theoretical studies of globular proteins [6, 7, 36, 57, 69-76] suggest that native topology is the principal determinant of the folding mechanism. Thus, we employ a variant of the Go model of interactions [12, 77-81] (a model based solely on the native topology) in which we prevent the formation of non-native interactions, since we are solely interested in the role that native topology and native interactions may have in the formation of intermediates. We perform simulations and monitor the time evolution of the protein and the heat bath with the discrete molecular dynamics algorithm [82-88]. The higher performance of this algorithm over conventional molecular dynamics allows one to increase the computational speed by up to three orders of magnitude.

Frequencies and Folding Simulations 

To calculate the frequency map at $T = 1.0$, we probe the presence of the native contacts in each of the 1100 initially unfolded conformations. Then, we compute the probability of each native contact being present. To calculate the frequency map at $T_{target}$, we select one particular folding transition and probe the presence of the native contacts during the time interval that starts after the initial relaxation and ends when the simulation reaches the folding time $t_F$. To compute $t_F$, we stop the folding simulation when 90% of the native contacts have formed. Then, we trace back the folding trajectory and record $t_F$ when the root mean square distance with respect to the native state, rmsd, becomes smaller than 3 Å. We consider all protein conformations occurring for $t > t_F$ as belonging to the folded state and of no relevance to the folding transition.
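The two-step determination of the folding time can be sketched as follows. This is one possible reading of the procedure, stated as an assumption: the simulation is stopped at the first frame with at least 90% of the native contacts, and the trajectory is then traced back from that frame for as long as the rmsd stays below 3 Å, so that the folding time is the start of the final low-rmsd stretch. The function name `folding_time`, the fallback for a stop frame whose rmsd is above the cutoff, and the toy trajectory are hypothetical.

```python
from typing import List, Optional


def folding_time(
    times: List[float],
    native_fraction: List[float],
    rmsd: List[float],
    contact_cutoff: float = 0.9,
    rmsd_cutoff: float = 3.0,
) -> Optional[float]:
    """Return the folding time, or None if the trajectory never reaches the stop criterion."""
    # Step 1: first frame at which the fraction of native contacts reaches the cutoff.
    stop = next((i for i, q in enumerate(native_fraction) if q >= contact_cutoff), None)
    if stop is None:
        return None
    # ASSUMPTION: if the rmsd at the stop frame is not below the cutoff, use the stop time.
    if rmsd[stop] >= rmsd_cutoff:
        return times[stop]
    # Step 2: trace back while the preceding frames remain below the rmsd cutoff.
    i = stop
    while i > 0 and rmsd[i - 1] < rmsd_cutoff:
        i -= 1
    return times[i]


# Toy trajectory (hypothetical values): time in t.u., native-contact fraction, rmsd in angstroms.
toy_times = [0.0, 100.0, 200.0, 300.0, 400.0, 500.0]
toy_native = [0.10, 0.30, 0.50, 0.70, 0.85, 0.92]
toy_rmsd = [12.0, 9.0, 6.0, 2.8, 2.5, 2.0]
t_f = folding_time(toy_times, toy_native, toy_rmsd)
```

Frames between the end of the initial relaxation and the returned time would then be the ones used for the frequency map of the unfolded state.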



Similarity Score Function 

We introduce the similarity score function, $S = (a/23)(15 - b)/15$, where $a$ is the number of native contacts present that belong to the set of contacts $C_1$, and $b$ is the number of native contacts present that belong to the set $C_2$ (Fig. 3e). $C_1$ has 23 contacts and $C_2$ has 15 contacts. If the protein is unfolded, then $a \approx b \approx 0$, thus $S \approx 0$. Similarly, if the protein is folded, then $a \approx 23$ and $b \approx 15$, thus $S \approx 0$ again. Finally, if the protein adopts the intermediate $I_1$ structure, then $a \approx 23$ and $b \approx 0$, thus $S \approx 1$.
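The similarity score and its three limiting cases can be written out directly. In this minimal sketch the set sizes 23 and 15 come from the text, while the function and constant names are hypothetical.

```python
N_C1 = 23  # number of contacts in the first set of long-range contacts (from the text)
N_C2 = 15  # number of contacts in the second set of long-range contacts (from the text)


def similarity_score(a: int, b: int) -> float:
    """Similarity to the first intermediate.

    a: number of contacts of the first set that are present
    b: number of contacts of the second set that are present
    """
    return (a / N_C1) * (N_C2 - b) / N_C2


unfolded = similarity_score(0, 0)        # neither set formed
folded = similarity_score(23, 15)        # both sets formed
intermediate = similarity_score(23, 0)   # first set formed, second set absent
```

The score is high only when the first set is formed and the second is absent, which is why both the unfolded and the folded states score close to zero.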
Figure 1: (a) Upper triangle: c-Crk SH3 domain contact map with 160 native contacts. The secondary structure elements are the clusters of contacts that are organized perpendicularly to the map diagonal. Long-range contacts between the two termini are enclosed in the circle, and long-range contacts between the RT-loop and the distal hairpin are enclosed in the square. Lower triangle: the frequency map for the initial set of 1100 unfolded conformations at $T = 1.0$. (b) Frequency map of the unfolded state at $T = T_F = 0.626$. We compute the frequencies for a particular folding transition, whose potential energy trajectory we show in the inset (see Methods). Same for (c) $T = 0.54$ and (d) $T = 0.33$, the lowest temperature studied.

Figure 2: (a-e) Histograms of folding times for selected temperatures. At $T = 0.33$ and $T = 0.36$, the two lowest temperatures studied, histograms have a maximum for long folding times (↓*), which suggests the existence of putative intermediates. At $T = 0.33$, a maximum in the histogram (↓+), not present at $T = 0.36$, corresponds to short-lived kinetic traps. The distributions of folding times are unimodal at higher temperatures. At $T = 0.54$, the histogram is compact and has no tail of long folding times. At $T = 0.64$, the histogram fits a single-exponential distribution $e^{-t_F/\langle t_F \rangle} / \langle t_F \rangle$ for times larger than the relaxation time of 1500 t.u. (dashed line). We estimate the errors of the histogram bars as the square root of each bar. (f) Average folding time versus temperature. Each dot represents the folding time for a particular folding transition. (g) Ratio $r$ of the average and the standard deviation, $r = \langle t_F \rangle / \sigma_F$, for the distribution of folding times. The ratio approaches one above $T_{KP}$ and zero below $T_{KP}$. The ratio is maximal at $T_{KP}$, indicating a compact distribution of folding times at this temperature.

Figure 3: (a) Distributions of the potential energies of the slow folding pathways for temperatures below $T = 0.43$. The distributions are bimodal, suggesting two putative intermediates, $I_1$ and $I_2$. (b) The potential energy of the distribution peaks (* and O) increases with temperature, but the energy difference between peaks remains constant. The energy of the peaks is significantly larger than the equilibrium energy of the folded state (A). (c) Distributions of survival times at $T = 0.33$ for the high-energy intermediate $I_1$ (*), $\langle t_F \rangle = 1.81 \times 10^6$ and $\sigma = 1.85 \times 10^6$, and the low-energy intermediate $I_2$ (O), $\langle t_F \rangle = 2.47 \times 10^6$ and $\sigma = 2.43 \times 10^6$, fit to single-exponential distributions. (d) Arrhenius fit of the average survival time of intermediate $I_2$ below $T = 0.44$. This upper-bound temperature coincides with the temperature below which the distribution of the potential energies (Fig. 3a) of the slow folding pathways becomes bimodal. (e) Upper triangle: absent contacts (filled squares) and present contacts (crosses, "$C_4$") in intermediate $I_1$. Upon the transition $I_1 \to I_2$, these contacts reverse their presence (so that the filled squares are the present contacts and the crosses are the absent contacts). There are five more squares than crosses, which roughly accounts for the difference of six energy units between the two intermediates. Lower triangle: long-range contacts "$C_1$" are present in intermediate $I_1$, and long-range contacts "$C_2$" are absent. There are 23 contacts in "$C_1$" and 15 contacts in "$C_2$". (f) Probability that a folding transition at $T = T_{KP} = 0.54$ contains a protein conformation with similarity $S$ to intermediate $I_1$ (see Methods).

Figure 4: Schematic diagram of fast and slow folding pathways. At $T = 0.33$, in approximately 15% of the simulations, the model protein undergoes a folding transition through the slow folding pathways. We show the protein structure in $I_1$ and $I_2$ using the secondary structural elements of the native state, although some of these elements are not formed. In intermediate $I_1$, both termini and strand A form a β-sheet (in the ellipse). The corresponding set of native contacts is $C_1$. Dissociation of the β-sheet leads to rearrangements of the protein conformation and successful folding to the native state (NS). However, dissociation of the distal hairpin (in red) leads to more localized rearrangements that may lead the protein to intermediate $I_2$. Upon the $I_1 \to I_2$ transition, contacts of $C_2$ (the two ellipses in $I_2$) form, but contacts of $C_4$ (the ellipse in NS) break.